
    Machine Learning for Software Fault Detection: Issues and Possible Solutions

    Over the past years, thanks to the availability of new technologies and advanced hardware, research on artificial intelligence, and more specifically on machine and deep learning, has flourished. This renewed interest has led many researchers to apply machine and deep learning techniques in software engineering, including the domain of software quality. In this thesis, we investigate the performance of machine learning models for the detection of software faults with a threefold purpose. First, we aim at establishing which models are the most suitable; second, we aim at finding the common issues that prevent commonly used models from performing well in the detection of software faults; finally, we propose possible solutions to these issues. The analysis of the performance of the machine learning models highlighted two main issues: unbalanced data and time dependency within the data. To address these issues, we tested multiple techniques: treating the faults as anomalies and artificially generating more samples to solve the unbalanced data problem, and using deep learning models that take into account the history of each data sample to solve the time dependency issue. We found that using oversampling techniques to balance the data and deep learning models specific to time series classification substantially improves the detection of software faults. The results shed some light on the issues related to machine learning for the prediction of software faults. They indicate a need to consider the time dependency of the data used in software quality, which deserves more attention from researchers. Moreover, improving the detection of software faults could help practitioners to improve the quality of their software. In the future, more advanced deep learning models could be investigated, including the use of other metrics as predictors and of more advanced time series analysis tools that better take into account the time dependency of the data.
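    The two remedies reported here, oversampling the minority (faulty) class and using models that respect the temporal ordering of the samples, can be illustrated with a short sketch. The snippet below is not the thesis code: it uses scikit-learn and imbalanced-learn on synthetic data, and the dataset shape, the 5% fault rate, and the random forest baseline are illustrative assumptions.

    # Minimal sketch: effect of SMOTE oversampling on fault-detection AUC.
    # Synthetic data stands in for real per-commit software metrics.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    from imblearn.over_sampling import SMOTE  # requires imbalanced-learn

    X, y = make_classification(n_samples=5000, n_features=20,
                               weights=[0.95, 0.05], random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, test_size=0.3, random_state=42)

    # Baseline trained on the unbalanced data.
    clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
    auc_raw = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

    # Oversample only the training split, then retrain.
    X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
    clf_res = RandomForestClassifier(random_state=42).fit(X_res, y_res)
    auc_res = roc_auc_score(y_test, clf_res.predict_proba(X_test)[:, 1])

    print(f"AUC without oversampling: {auc_raw:.3f}")
    print(f"AUC with SMOTE oversampling: {auc_res:.3f}")

    For the time-dependency issue, the random forest would be replaced by a time series classifier (for example a recurrent network fed with the metric history of each sample); the oversampling step stays the same.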

    Classification of Building Information Model (BIM) Structures with Deep Learning

    In this work we study an application of machine learning to the construction industry, using classical and modern machine learning methods to categorize images of building designs into three classes: Apartment building, Industrial building, or Other. No real photographs are used, only images extracted from Building Information Model (BIM) software, as this is what the construction industry uses to store building designs. For this task, we compared four different methods: the first is based on classical machine learning, where a Histogram of Oriented Gradients (HOG) was used for feature extraction and a Support Vector Machine (SVM) for classification; the other three methods are based on deep learning, covering common pre-trained networks as well as networks designed from scratch. To validate the accuracy of the models, a database of 240 images was used. The accuracy achieved is 57% for the HOG + SVM model and above 89% for the neural networks.
    Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
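    As a rough illustration of the classical pipeline described above (HOG features fed to an SVM), the sketch below uses scikit-image and scikit-learn. It is not the authors' code: the image size, HOG parameters, linear kernel and the one-folder-per-class layout under bim_images/ are all illustrative assumptions.

    # Minimal HOG + SVM sketch for three-class image classification.
    from pathlib import Path
    import numpy as np
    from skimage.io import imread
    from skimage.transform import resize
    from skimage.feature import hog
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    def hog_features(path, size=(128, 128)):
        # Load as grayscale, resize to a fixed shape, extract HOG descriptors.
        img = resize(imread(path, as_gray=True), size)
        return hog(img, orientations=9, pixels_per_cell=(16, 16),
                   cells_per_block=(2, 2))

    root = Path("bim_images")  # hypothetical dataset root, one folder per class
    classes = ["apartment", "industrial", "other"]
    X, y = [], []
    for label, name in enumerate(classes):
        for img_path in sorted((root / name).glob("*.png")):
            X.append(hog_features(img_path))
            y.append(label)

    scores = cross_val_score(SVC(kernel="linear"), np.array(X), np.array(y), cv=5)
    print("Cross-validated accuracy:", scores.mean())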

    A machine and deep learning analysis among SonarQube rules, product, and process metrics for fault prediction

    Background: Developers spend more time fixing bugs and refactoring code to increase maintainability than developing new features. Researchers have investigated the impact of code quality on fault-proneness, focusing on code smells and code metrics. Objective: We aim at advancing fault-inducing commit prediction using different variables, such as SonarQube rules and product and process metrics, and adopting different techniques. Method: We designed and conducted an empirical study on 29 Java projects, analyzed with SonarQube and the SZZ algorithm to identify fault-inducing and fault-fixing commits, and computed different product and process metrics. Moreover, we investigated fault-proneness using different machine and deep learning models. Results: We analyzed 58,125 commits containing 33,865 faults and affected by more than 174 SonarQube rules violated 1.8M times, on which 48 software product and process metrics were calculated. The results clearly identified a set of features that provided highly accurate fault prediction (more than 95% AUC). Regarding the performance of the classifiers, deep learning provided higher accuracy than the machine learning models. Conclusion: Future work might investigate whether other static analysis tools, such as FindBugs or Checkstyle, provide similar or different results. Moreover, researchers might consider the adoption of time series analysis and anomaly detection techniques.
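    The SZZ step used here (walking back from a fault-fixing commit to the commits that last touched the fixed lines) can be sketched with PyDriller, whose Git.get_commits_last_modified_lines helper performs this blame step in recent versions. This is not the study's implementation, and the repository path and fixing-commit hashes below are hypothetical placeholders.

    # Minimal SZZ-style sketch: blame the lines changed by known fault-fixing
    # commits to collect candidate fault-inducing commits.
    from pydriller import Git

    repo = Git("/path/to/java-project")    # hypothetical local clone
    fixing_hashes = ["abc123", "def456"]   # hypothetical fault-fixing commits

    fault_inducing = set()
    for h in fixing_hashes:
        fix_commit = repo.get_commit(h)
        # Maps each modified file to the commits that last touched the changed lines.
        blamed = repo.get_commits_last_modified_lines(fix_commit)
        for commits in blamed.values():
            fault_inducing.update(commits)

    print(f"{len(fault_inducing)} candidate fault-inducing commits")

    Commits labelled this way, joined with the product and process metrics computed for each commit, form the training set for the fault-prediction models.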

    Does migrating a monolithic system to microservices decrease the technical debt?

    Background: The migration from a monolithic system to microservices requires a deep refactoring of the system. Therefore, such a migration usually has a big economic impact, and companies tend to postpone several activities during this process, mainly to speed up the migration itself, but also because of the demand for releasing new features. Objective: We monitored the technical debt of an SME while it migrated from a legacy monolithic system to an ecosystem of microservices. Our goal was to analyze changes in the code technical debt before and after the migration to microservices. Method: We conducted a case study analyzing more than four years of the history of a twelve-year-old project (280K lines of code) in which two teams extracted five business processes from the monolithic system as microservices. We first analyzed the technical debt with SonarQube and then performed a qualitative study with company members to understand the perceived quality of the system and the motivation for any postponed activities. Results: The migration to microservices helped to reduce the technical debt in the long run. Despite an initial spike in the technical debt due to the development of the new microservices, after a relatively short period the technical debt tended to grow more slowly than in the monolithic system.
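    Monitoring code technical debt over time, as done here with SonarQube, can be approximated through the SonarQube web API (sqale_index is SonarQube's technical-debt estimate in minutes). The sketch below is not the study's tooling: the server URL, project key and token are placeholders, and the endpoint and response fields should be checked against your SonarQube version.

    # Minimal sketch: pull the technical-debt (sqale_index) history of a project.
    import requests

    SONAR_URL = "https://sonarqube.example.com"   # placeholder server
    PROJECT_KEY = "my-monolith"                   # placeholder project key
    TOKEN = "..."                                 # placeholder API token

    resp = requests.get(
        f"{SONAR_URL}/api/measures/search_history",
        params={"component": PROJECT_KEY, "metrics": "sqale_index", "ps": 500},
        auth=(TOKEN, ""),  # SonarQube tokens are sent as the basic-auth user
    )
    resp.raise_for_status()

    for measure in resp.json().get("measures", []):
        for point in measure.get("history", []):
            hours = float(point.get("value", 0)) / 60
            print(point["date"], f"{hours:.1f} h of technical debt")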

    Are SonarQube Rules Inducing Bugs?

    The popularity of tools for analyzing technical debt, and particularly of SonarQube, is increasing rapidly. SonarQube proposes a set of coding rules, each of which represents something wrong in the code that will soon be reflected in a fault or will increase maintenance effort. However, our local companies were not confident in the usefulness of the rules proposed by SonarQube and contracted us to investigate their fault-proneness. In this work we aim at understanding which SonarQube rules are actually fault-prone and which machine learning models can be adopted to accurately identify fault-prone rules. We designed and conducted an empirical study on 21 well-known mature open-source projects. We applied the SZZ algorithm to label the fault-inducing commits and analyzed the fault-proneness by comparing the classification power of seven machine learning models. Among the 202 rules defined for Java by SonarQube, only 25 can be considered to have relatively low fault-proneness. Moreover, violations considered as 'bugs' by SonarQube were generally not fault-prone and, consequently, the fault-prediction power of the model proposed by SonarQube is extremely low. The rules applied by SonarQube for calculating technical debt should be thoroughly investigated and their harmfulness needs to be further confirmed. Therefore, companies should carefully consider which rules they really need to apply, especially if their goal is to reduce fault-proneness.
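    Comparing the classification power of several models, as done here across seven classifiers, typically comes down to cross-validated AUC on the same feature matrix. The sketch below is not the paper's code: synthetic data stands in for the per-commit rule-violation features, and only four common scikit-learn models are shown.

    # Minimal sketch: compare classifiers on the same data via cross-validated AUC.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

    X, y = make_classification(n_samples=3000, n_features=50,
                               weights=[0.9, 0.1], random_state=0)

    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "decision_tree": DecisionTreeClassifier(random_state=0),
        "random_forest": RandomForestClassifier(random_state=0),
        "gradient_boosting": GradientBoostingClassifier(random_state=0),
    }

    for name, model in models.items():
        auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
        print(f"{name}: mean AUC = {auc.mean():.3f} (+/- {auc.std():.3f})")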

    MLOps for evolvable AI intensive software systems


    On The Benefits of the Accelerate Metrics: An Industrial Survey at Vendasta
